[Dashboard] Remove gpustats dependencies from Ray[default] #41044

jonathan-anyscale · 2023-11-09T06:49:28Z

Why are these changes needed?

Add a method that gets GPU utilization the same way gpustats did, and remove gpustats from the ray[default] dependencies.

Related issue number

Checks

- I've signed off every commit (by using the -s flag, i.e., git commit -s) in this PR.
- I've run scripts/format.sh to lint the changes in this PR.
- I've included any doc changes needed for https://docs.ray.io/en/master/.
  - I've added any new APIs to the API Reference. For example, if I added a
    method in Tune, I've added it in doc/source/tune/api/ under the
    corresponding .rst file.
- I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/
- Testing Strategy
  - Unit tests
  - Release tests
  - This PR is not tested :(

Signed-off-by: Jonathan Nitisastro <jonathancn@anyscale.com>

jjyao

lg. Could you update the PR description?

dashboard/client/src/pages/node/index.tsx

dashboard/modules/reporter/reporter_agent.py


dashboard/client/src/type/node.d.ts


dashboard/modules/reporter/reporter_agent.py

src/ray/raylet/scheduling/cluster_resource_scheduler.cc


jjyao · 2023-11-16T00:59:15Z

@ericl needs your approval here as code owner.

Context: we are replacing the GPUtil and gpustats dependencies with a vendored single-file pynvml library (the official NVIDIA library that other libraries use internally), so that it works with minimal Ray. This also removes the unnecessary transitive dependencies that gpustats pulls in for terminal display.

dashboard/modules/reporter/reporter_agent.py


XuehaiPan · 2023-11-17T09:18:56Z

dashboard/modules/reporter/reporter_agent.py

+                    memory_used=int(pynvml.nvmlDeviceGetMemoryInfo(gpu_handle).used)
+                    // MB,
+                    memory_total=int(pynvml.nvmlDeviceGetMemoryInfo(gpu_handle).total)
+                    // MB,


We could merge these two nvmlDeviceGetMemoryInfo calls into one.

FYI, in NVIDIA driver 510.39.01, a v2 memory info API was added:

https://github.com/NVIDIA/nvidia-settings/blob/510.39.01/src/nvml.h#L218-L241

The unversioned API (v1) and v2 API will return different results on R510+ drivers.
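A minimal sketch of the merged call, for illustration only. `memory_fields`, `FakeNvml`, and `FakeMemory` are hypothetical names that are not part of this PR, and `MB` is assumed to be 1024 * 1024 as in the diff's integer division. The fake object stands in for pynvml so the sketch runs without a GPU; in the reporter agent the real `pynvml` module would be passed instead.

```python
from collections import namedtuple

MB = 1024 * 1024  # assumption: same meaning as the MB constant in the diff

# Hypothetical stand-ins so the sketch runs without a GPU or pynvml installed.
FakeMemory = namedtuple("FakeMemory", ["total", "free", "used"])


class FakeNvml:
    """Mimics only nvmlDeviceGetMemoryInfo and counts how often it is called."""

    def __init__(self):
        self.calls = 0

    def nvmlDeviceGetMemoryInfo(self, handle):
        self.calls += 1
        return FakeMemory(total=16384 * MB, free=14336 * MB, used=2048 * MB)


def memory_fields(nvml, gpu_handle):
    """Query memory info once and derive both fields from that one result."""
    memory_info = nvml.nvmlDeviceGetMemoryInfo(gpu_handle)
    return {
        "memory_used": int(memory_info.used) // MB,
        "memory_total": int(memory_info.total) // MB,
    }


fake = FakeNvml()
fields = memory_fields(fake, gpu_handle=None)
print(fields, "calls:", fake.calls)
```

Besides saving one driver call per GPU, reading both values from a single result means used and total come from the same snapshot.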


jjyao · 2023-11-21T17:52:17Z

python/ray/_private/thirdparty/pynvml/__init__.py

Add a comment explaining why we picked this version, something like: "We use this version because it uses the v2 API and supports a wider range of drivers."


ericl

Always great to see deps removed


…41044)" (#41375)

Reverts #41044. Premerge is busted and potentially blocking people from merging into the branch cut; revert to unblock.

Failing test: linux://python/ray/tests:test_streaming_generator_2

This reverts commit 9b9fb55.


jonathan-anyscale mentioned this pull request Nov 9, 2023

[Core] Ray auto detect nvidia Gpu with pynvml #41020

Merged

8 tasks

remove gpustats from ray default

13fb433

Signed-off-by: Jonathan Nitisastro <jonathancn@anyscale.com>

jonathan-anyscale force-pushed the remove_gpustats branch from 16adb6d to 13fb433 Compare November 13, 2023 22:38

jonathan-anyscale assigned jjyao Nov 13, 2023

jonathan-anyscale marked this pull request as ready for review November 13, 2023 22:52

jonathan-anyscale requested review from richardliaw, ericl and edoakes as code owners November 13, 2023 22:52

fix type class

48056cf

Signed-off-by: Jonathan Nitisastro <jonathancn@anyscale.com>

jjyao reviewed Nov 14, 2023

dashboard/client/src/pages/node/index.tsx (outdated)

dashboard/modules/reporter/reporter_agent.py (outdated)

dashboard/modules/reporter/reporter_agent.py (outdated)

wookayin reviewed Nov 14, 2023

dashboard/modules/reporter/reporter_agent.py (outdated)

fix redundant

7d3c486

Signed-off-by: Jonathan Nitisastro <jonathancn@anyscale.com>

jjyao approved these changes Nov 15, 2023


jjyao requested a review from wookayin November 15, 2023 16:51

jjyao reviewed Nov 15, 2023

dashboard/client/src/type/node.d.ts

lint

b035611

Signed-off-by: Jonathan Nitisastro <jonathancn@anyscale.com>

wookayin reviewed Nov 15, 2023

dashboard/modules/reporter/reporter_agent.py (outdated)

dashboard/modules/reporter/reporter_agent.py

src/ray/raylet/scheduling/cluster_resource_scheduler.cc (outdated)

nit fix

91e9725

Signed-off-by: Jonathan Nitisastro <jonathancn@anyscale.com>

jjyao assigned ericl Nov 16, 2023

jjyao requested a review from wookayin November 16, 2023 00:59

wookayin approved these changes Nov 16, 2023


jjyao reviewed Nov 16, 2023

dashboard/modules/reporter/reporter_agent.py

downgrade nvidia-ml-py to 11.510.69, using v2 api

f23603d

Signed-off-by: Jonathan Nitisastro <jonathancn@anyscale.com>

XuehaiPan reviewed Nov 17, 2023


decode pynvml bytes to str

1e0f5ad

Signed-off-by: Jonathan Nitisastro <jonathancn@anyscale.com>
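The "decode pynvml bytes to str" commit above can be pictured with a small helper. This is a sketch only, not the code from the commit: `decode` is a hypothetical name, and it assumes the vendored pynvml returns `bytes` for string fields such as the device name, while tolerating versions that already return `str`.

```python
def decode(value):
    """Return a str whether the NVML binding handed back bytes or str."""
    if isinstance(value, bytes):
        return value.decode("utf-8")
    return value


# Example inputs are made up; real values would come from calls such as
# pynvml.nvmlDeviceGetName(gpu_handle).
print(decode(b"NVIDIA A100"))
print(decode("Tesla T4"))
```

Normalizing at the boundary keeps the rest of the reporter code free of bytes/str checks and keeps the values JSON-serializable for the dashboard.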

jonathan-anyscale force-pushed the remove_gpustats branch from c2b4a50 to 1e0f5ad Compare November 20, 2023 18:04

jjyao approved these changes Nov 21, 2023


jonathan-anyscale added 2 commits November 21, 2023 10:06

nit

9a3332c

Signed-off-by: Jonathan Nitisastro <jonathancn@anyscale.com>

add pynvml version note

551c02d

Signed-off-by: Jonathan Nitisastro <jonathancn@anyscale.com>

ericl approved these changes Nov 22, 2023


jjyao merged commit 9b9fb55 into ray-project:master Nov 22, 2023
2 checks passed

can-anyscale added a commit that referenced this pull request Nov 24, 2023

Revert "[Dashboard] Remove gpustats dependencies from Ray[default] (#…

513914c

…41044)" This reverts commit 9b9fb55.

can-anyscale mentioned this pull request Nov 24, 2023

Revert "[Dashboard] Remove gpustats dependencies from Ray[default]" #41375

Merged

jonathan-anyscale deleted the remove_gpustats branch November 27, 2023 17:29
